ABSTRACT 


This research article presents a novel approach for mining High Utility Itemsets (HUIs) by integrating a Genetic 
Algorithm (GA) with the SARSA algorithm. It begins with an overview of GA's 
fundamental principles and operational procedures, followed by a description of the SARSA algorithm's 
components, supported by diagrammatic representations. The core contribution of this study is the 
Intelligent Genetic Algorithm with on-policy Reinforcement Learning (IGA_RLON) 
methodology. The effectiveness of IGA_RLON is 
evaluated in terms of execution time, convergence speed, and the percentage of successfully mined HUIs, 
through comparative analysis with established methods such as IGA_RLOFF, HUPEUMU-GRAM, and 
HUIM-BPSO. This article aims to advance the field of HUI mining by proposing a robust and efficient 
algorithmic framework. 


Keywords: Genetic Algorithm, SARSA Algorithm, Reinforcement Algorithm, High Utility Itemset Mining, 
Data Mining, Control Parameters 


1. INTRODUCTION 


High utility itemset mining (HUIM) is a 
significant task in data mining and machine learning. 
It is concerned with discovering itemsets that are 
highly valuable, based on a utility function that 
measures their usefulness. HUIM is used in a variety 
of domains, such as marketing, healthcare, e- 
commerce, and finance, where the identification of 
valuable itemsets can provide valuable insights for 
decision-making. 


1.1. Genetic Algorithm 


GA is a computational technique inspired by the 
process of natural selection and evolution in biology. 
It is a metaheuristic optimization algorithm that is 
used to find optimal solutions to complex problems. 
GA works by simulating the process of evolution, 
where candidate solutions (individuals) are encoded as 
chromosomes and undergo genetic operations such as 
crossover and mutation to produce offspring (new 
solutions) [1]. The fitness of these solutions is 
evaluated using an objective function, and the 
process is repeated iteratively until a stopping 
criterion is met. GA has been applied in various 
domains, including engineering, economics, and 
machine learning, due to its ability to handle 
complex and multi-dimensional problems [2]. 


GA has been successfully applied in HUIM 
to efficiently discover valuable itemsets from large 
datasets. In HUIM, the objective is to find itemsets 
with high utility, which is determined by a utility 
function that measures the usefulness of an itemset. 
GA-based approaches for HUIM involve 
representing itemsets as chromosomes, and using 
genetic operations such as crossover and mutation to 
generate new itemsets [3]. The fitness of these 
itemsets is evaluated using the utility function, and 
the process is repeated iteratively until a stopping 
criterion is met. 


The key objective of HUIM is to find itemsets 
with high utility, which is determined by a utility 
function. The utility function measures the 
usefulness or value of an itemset in a given context. 
For instance, in the context of retail sales, the utility 
function may be defined in terms of the profit 
obtained from selling an itemset [4]. In healthcare, 
the utility function may measure the effectiveness of 
a treatment based on the outcomes of a clinical trial. 
In finance, the utility function may be defined in 
terms of the return on investment of a portfolio of 
securities. 
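
As an illustration of the retail case, the utility of an itemset is commonly computed as purchased quantity times unit profit, summed over the transactions that contain the whole itemset. The sketch below assumes that definition; the item names, quantities, profits, and function names are hypothetical and are not taken from the datasets used later.

```python
from typing import Dict, FrozenSet, List

# Hypothetical unit profits (external utility) per item.
PROFIT: Dict[str, int] = {"bread": 1, "milk": 2, "cheese": 5}

# Hypothetical transactions: item -> purchased quantity (internal utility).
DATABASE: List[Dict[str, int]] = [
    {"bread": 2, "milk": 1},
    {"milk": 3, "cheese": 1},
    {"bread": 1, "milk": 2, "cheese": 2},
]


def itemset_utility(itemset: FrozenSet[str], database: List[Dict[str, int]]) -> int:
    """Sum quantity * profit over the transactions containing every item of the itemset."""
    total = 0
    for transaction in database:
        if all(item in transaction for item in itemset):
            total += sum(transaction[item] * PROFIT[item] for item in itemset)
    return total


def is_high_utility(itemset: FrozenSet[str], database: List[Dict[str, int]], min_util: int) -> bool:
    """An itemset is a HUI when its utility reaches the minimum utility threshold."""
    return itemset_utility(itemset, database) >= min_util
```

With these toy values, {milk, cheese} occurs in the second and third transactions and has utility (3*2 + 1*5) + (2*2 + 2*5) = 25, so it is a HUI for any threshold up to 25.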


The applications of HUIM are numerous 
and diverse. In marketing, HUIM can help identify 
product bundles that are likely to be purchased 
together, which can increase revenue and customer 
satisfaction [5]. In healthcare, HUIM can aid in 
identifying effective treatments for specific diseases 
based on patient outcomes. In e-commerce, HUIM 
can help identify items that are frequently purchased 
together, which can be used to provide personalized 
recommendations to customers. In finance, HUIM 
can aid in portfolio optimization by identifying 
securities that are likely to provide a high return on 
investment. 


One of the primary challenges in HUIM is 
to efficiently discover high utility itemsets from a 
large dataset. Since the number of itemsets can be 
exponential, it is necessary to develop efficient 
algorithms to identify the most valuable itemsets [6]. 
Several evolutionary algorithms have been 
proposed for HUIM, including HUPEUMU-GRAM 
and HUIM-BPSO. The present research explores 
the discovery of HUIs using a GA whose 
operators are calibrated using SARSA learning. 


1.2. GA Working Procedure 


Figure 1 presents the flowchart of the 
GA workflow. The algorithm 
begins by initializing a population of candidate 
solutions, which are evaluated based on their fitness. 
The fittest individuals are then selected for 
reproduction, with the aim of producing even better 
solutions in the next generation [7]. 


Crossover and mutation are applied to the 
selected individuals to produce new offspring, which 
are then evaluated for their fitness. The fittest 
individuals from the new generation are selected 
again for further reproduction through crossover and 
mutation, and the process is repeated until a stopping 
criterion is satisfied [8]. 


The stopping criterion may be based on a 
fixed number of iterations, reaching a certain level 
of fitness, or other factors. If the stopping criterion is 
not satisfied, the algorithm loops back to the 
selection stage to continue the process. Once the 
stopping criterion is satisfied, the algorithm 
terminates and returns the best solution found. 


[Figure 1 flowchart: Initialize Population → Evaluate Fitness → Stop Condition Met? Yes: Return Best Solution. No: Select Parents → Crossover → Mutate Offspring → Evaluate Fitness, then back to the stop check] 


Figure 1 : Genetic Algorithm Workflow 
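
The workflow of Figure 1 can be sketched as a short loop. This is a minimal illustration under stated assumptions, not the implementation used in this study: the function name `run_ga`, the binary tournament selection, the fixed iteration budget, and all default parameter values are choices made for the example.

```python
import random
from typing import Callable, List

Chromosome = List[int]


def run_ga(fitness: Callable[[Chromosome], float], n_genes: int, pop_size: int = 20,
           max_iter: int = 50, crossover_rate: float = 0.8,
           mutation_rate: float = 0.1, seed: int = 0) -> Chromosome:
    """Generic binary GA; requires n_genes >= 2 and pop_size >= 2."""
    rng = random.Random(seed)
    # Initialize population with random bit strings.
    population = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(max_iter):  # stop condition: fixed number of iterations
        offspring: List[Chromosome] = []
        while len(offspring) < pop_size:
            # Select parents by binary tournament (an assumption of this sketch).
            parents = []
            for _ in range(2):
                a, b = rng.sample(population, 2)
                parents.append(a if fitness(a) >= fitness(b) else b)
            c1, c2 = parents[0][:], parents[1][:]
            # Crossover at a random point with probability crossover_rate.
            if rng.random() < crossover_rate:
                point = rng.randint(1, n_genes - 1)
                c1 = parents[0][:point] + parents[1][point:]
                c2 = parents[1][:point] + parents[0][point:]
            # Mutate each offspring with probability mutation_rate (flip one gene).
            for child in (c1, c2):
                if rng.random() < mutation_rate:
                    gene = rng.randrange(n_genes)
                    child[gene] = 1 - child[gene]
                offspring.append(child)
        population = offspring[:pop_size]
        # Evaluate fitness and remember the best solution seen so far.
        best = max(population + [best], key=fitness)
    return best
```

For example, `run_ga(sum, n_genes=10)` searches for a 10-bit string with as many ones as possible; the fixed seed makes repeated runs return the same result.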


1.3. Crossover Operation 


Figure 2 illustrates the 
crossover operation of GA. Two parents 
contribute genetic material to the offspring. 
Before the crossover operation occurs, a random 
number is generated. If the random number is below 
the crossover rate (Cp), a crossover operation 
occurs at a randomly chosen crossover point. 
Otherwise, the offspring are simply copies of their 
respective parents [9]. The fitness of each resulting 
offspring is then evaluated. 


[Figure 2 flowchart: Parent 1 and Parent 2 → Select genetic material → Generate random number rand_num → rand_num < crossover_rate? No: no crossover, Offspring 1 = Parent 1 and Offspring 2 = Parent 2. Yes: Select crossover point → Create offspring 1 and offspring 2 → Evaluate offspring 1 and offspring 2] 


Figure 2 : Crossover Operation 
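
A single-point crossover guarded by the crossover rate, as described above, might look like the following sketch. The function name and the explicit `rng` argument are assumptions made so the example is reproducible.

```python
import random
from typing import List, Tuple


def single_point_crossover(parent1: List[int], parent2: List[int], crossover_rate: float,
                           rng: random.Random) -> Tuple[List[int], List[int]]:
    """Return two offspring; parents are copied unchanged when no crossover occurs."""
    if len(parent1) != len(parent2):
        raise ValueError("parents must have the same length")
    # No crossover: the random number is not below the crossover rate
    # (or the chromosomes are too short to have an interior cut point).
    if len(parent1) < 2 or rng.random() >= crossover_rate:
        return parent1[:], parent2[:]
    # Crossover: swap the tails after a randomly chosen interior point.
    point = rng.randint(1, len(parent1) - 1)
    return parent1[:point] + parent2[point:], parent2[:point] + parent1[point:]
```

With a rate of 0 the offspring are always copies of the parents, and with a rate of 1 a cut point is always chosen, which matches the two branches of Figure 2.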


1.4. Mutation Operation 


Figure 3 shows the 
mutation operation. The output 
of the crossover operation serves as the input to 
mutation, which is applied to each individual. A single 
parent undergoes mutation with a certain 
probability, determined by a specified mutation rate. 
Before mutation occurs, a random number is 
generated [10]. If the random number is below the 
mutation rate, a gene is randomly selected for 
mutation, and a new allele is generated for the 
selected gene. The resulting offspring is then evaluated. 
Otherwise, the 
parent is simply copied to become the offspring. 


[Figure 3 flowchart: Parent → Select genetic material → Generate random number rand_num → rand_num < mutation_rate? No: no mutation, Offspring = Parent. Yes: Select gene to mutate → Create mutated offspring] 


Figure 3 : Mutation Operation 
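
The mutation step can be sketched in the same style. The example assumes binary chromosomes, so the "new allele" for the selected gene is simply the flipped bit; the function name is hypothetical.

```python
import random
from typing import List


def mutate(parent: List[int], mutation_rate: float, rng: random.Random) -> List[int]:
    """Flip one randomly selected gene with probability mutation_rate."""
    offspring = parent[:]  # the parent itself is left untouched
    if offspring and rng.random() < mutation_rate:
        gene = rng.randrange(len(offspring))   # select gene to mutate
        offspring[gene] = 1 - offspring[gene]  # new allele for a binary gene
    return offspring
```

When the random number is not below the mutation rate, the returned offspring is an unchanged copy of the parent.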


2. LITERATURE SURVEY 


The methodology used in [11] involves the 
development of a Decomposition based on a 
compact Genetic Algorithm (DcGA) for mining 
closed high-utility itemsets (CHUIs) in large-scale 
databases. The process begins with transforming the 
transaction database into a graph network, followed 
by the application of community detection to create 
groups of highly correlated transactions. The 
compact genetic algorithm is then applied to each 
community to find local closed high utility patterns, 
and the results are concatenated to derive global 
closed high utility patterns. This approach aims to 
efficiently mine CHUIs in a limited time and obtain 
a good predictive model for pattern 
recommendation. The methodology also includes a 
comparison of the proposed DcGA with existing 
pattern mining algorithms in terms of runtime and 
effectiveness analysis, as well as convergence 
performance. 

A novel genetic algorithm (GA)-based approach 
for safeguarding sensitive high utility itemsets 
during utility mining, aiming to minimize 
information loss while protecting critical data was 
designed in [12]. It introduces a flexible evaluation 
function and leverages the downward closure 
property and pre-large concept to accelerate 
chromosome evaluation, reducing database 
rescanning costs. Highlighting the proliferation of 
electronic data and the necessity for privacy- 
preserving techniques, it underscores the importance 
of addressing confidentiality concerns. This GA- 
based strategy represents the first attempt at privacy- 
preserving high utility itemset mining, employing 
transaction insertion for data concealment. It 
underscores the complexities of data mining, 
particularly in managing privacy, and underscores 
the need for efficient algorithms to mitigate these 
challenges. 

The researcher proposed an Ant Colony 
Optimization (ACO)-based methodology for mining 
high-utility itemsets in [13]. It involves leveraging 
the behavior of ant colonies to efficiently explore the 
search space and identify itemsets with high utility 
values in large datasets. The methodology likely 
includes the design of pheromone update rules, 
heuristic information, and exploration-exploitation 
strategies tailored for high-utility itemset mining. 
Additionally, it may incorporate mechanisms for 
handling constraints and optimizing performance 
metrics such as runtime and solution quality. 

The methodology in [14] involves the development of 
an evolutionary algorithm that optimizes for both 
frequency and utility simultaneously. This 
entails creating specialized fitness functions and 
employing Pareto-based optimization strategies to 
efficiently extract itemsets meeting both criteria. 
Additionally, the methodology likely includes 
techniques for addressing scalability and efficiency 
concerns when dealing with large datasets. 

An evolutionary approach called the Artificial Bee 
Colony (ABC) algorithm was proposed in [15] to 
discover high utility itemsets. It involves initializing 
artificial bees to explore the solution space 
iteratively. The bees adjust positions to represent 
changes to candidate itemsets, guided by the 
principles of the ABC algorithm. Fitness evaluation 
assesses itemset utility based on predefined criteria. 
Selection mechanisms choose promising itemsets for 
the next iteration. The iterative process continues 
until convergence criteria are met, with performance 
evaluated based on effectiveness in discovering high 
utility itemsets. 


3. PROPOSED METHODOLOGY 


In the proposed IGA_RLON approach, the GA control 
parameters, namely the crossover and mutation 
rates, are calibrated intelligently using the action 
chosen by the SARSA learning algorithm. 
IGA_RLON is then used to mine the high utility itemsets 
from the benchmark datasets. 


3.1. Methodology of SARSA 


A typical methodology involved in SARSA learning 
is illustrated in the Figure 4. Below are the steps 
involved in SARSA learning. 
1. Initialize the Q(s,a) table with arbitrary values 
for all possible state-action pairs. 
2. Select an action a using an ε-greedy 
policy: a random action is selected with probability ε (e.g., 
10%), and the action with the highest Q-value 
is selected with probability 1−ε. 
3. Perform the selected action a and observe the 
reward R and the next state s'. 
4. Select the next action a' for the next state 
s' using the same ε-greedy policy. 
5. Update the Q(s,a) value for the current 
state-action pair using the SARSA update rule in 
Eq. 1: 

$$Q(s,a) \leftarrow Q(s,a) + \alpha \left[ R + \gamma\, Q(s',a') - Q(s,a) \right] \tag{1}$$

where α is the learning rate, γ is the discount 
factor, and Q(s',a') is the Q-value for the next 
state-action pair. The term R + γQ(s',a') is the target 
Q-value, and the left-hand side is the updated Q-value. 
Because a' is chosen by the same ε-greedy policy that 
generates the behaviour, the target policy is the same as 
the behaviour policy (on-policy learning). 
6. Set the current state to the next state s' and 
the current action to the next action a'. 
7. Repeat steps 2-6 until the algorithm converges 
to the optimal Q-values for all state-action pairs. 
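
The seven steps can be sketched as a small tabular implementation. The corridor environment in `step`, the function names, the random tie-breaking in the greedy choice, and the parameter values are assumptions made for illustration; only the update inside `sarsa_update` corresponds to Eq. 1.

```python
import random
from typing import Dict, List, Tuple

QTable = Dict[Tuple[int, int], float]


def epsilon_greedy(q: QTable, state: int, actions: List[int], epsilon: float,
                   rng: random.Random) -> int:
    """Random action with probability epsilon, otherwise a highest-valued action."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    best = max(q[(state, a)] for a in actions)
    return rng.choice([a for a in actions if q[(state, a)] == best])  # break ties randomly


def sarsa_update(q, s, a, reward, s_next, a_next, alpha, gamma) -> None:
    """Eq. 1: Q(s,a) <- Q(s,a) + alpha * [R + gamma * Q(s',a') - Q(s,a)]."""
    q[(s, a)] += alpha * (reward + gamma * q[(s_next, a_next)] - q[(s, a)])


def step(state: int, action: int) -> Tuple[int, float, bool]:
    """Hypothetical 4-state corridor: action 1 moves right, action 0 moves left; state 3 is the goal."""
    next_state = min(3, state + 1) if action == 1 else max(0, state - 1)
    done = next_state == 3
    return next_state, (1.0 if done else 0.0), done


def train(episodes: int = 200, alpha: float = 0.5, gamma: float = 0.9,
          epsilon: float = 0.1, seed: int = 0) -> QTable:
    rng = random.Random(seed)
    actions = [0, 1]
    q: QTable = {(s, a): 0.0 for s in range(4) for a in actions}   # step 1
    for _ in range(episodes):
        s = 0
        a = epsilon_greedy(q, s, actions, epsilon, rng)            # step 2
        done = False
        while not done:
            s_next, r, done = step(s, a)                           # step 3
            a_next = epsilon_greedy(q, s_next, actions, epsilon, rng)  # step 4
            sarsa_update(q, s, a, r, s_next, a_next, alpha, gamma)  # step 5
            s, a = s_next, a_next                                  # step 6
    return q                                                        # step 7: fixed episode budget
```

Note that the action used in the update target is the one actually taken next, which is what distinguishes this on-policy rule from an off-policy update that would use the maximum Q-value of the next state.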


3.2. Design of IGA_RLON 


Figure 4 illustrates the architecture design of the 
IGA_RLON algorithm. Initially, the itemsets are 
represented as binary chromosomes of the population. 
The Roulette wheel selection (RWS) strategy is used to 
select parent chromosomes from the population. 
The state set Ss is calculated from the parent 
chromosomes. An appropriate action is then 
chosen using the ε-greedy action selection 
scheme. From the action chosen, the crossover rate 
(Cp) and mutation rate (Mp) are calibrated 
intelligently. 


Journal of Theoretical and Applied Information Technology 
15th May 2024. Vol.102. No 9 
© Little Lion Scientific 
ISSN: 1992-8645 


[Figure 4 diagram. GA with Intelligent Control Parameter: Initial chromosomes (HUIs) → Select chromosomes using RWS → Calculate Ss → Calibrate GA control parameters (Cp and Mp) → Perform crossover and mutation → Fitness evaluation → Next episode. Reinforcement Learning: State, Action, Reward, Q-Table, Q-Value] 


Figure 4 : Architecture Design of IGA_RLon 


The chromosomes are then updated using Cp and 
Mp, and the fitness of the chromosomes is evaluated. 
Finally, the chromosomes with fitness greater than 
the minimum utility threshold are added to the HUI list. 
In parallel, the Q-value and Q-table are updated 
based on SARSA learning. 
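
The link between the chosen action and the GA control parameters can be sketched as a lookup. The table that maps actions to Cp and Mp values is not reproduced in this text, so the four (Cp, Mp) pairs below are placeholders, and the names `ACTION_TABLE`, `select_action`, and `calibrate` are hypothetical.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical action set: each action indexes a (crossover rate, mutation rate) pair.
# These values are placeholders for illustration, not the values used in the paper.
ACTION_TABLE: List[Tuple[float, float]] = [
    (0.6, 0.01),
    (0.7, 0.05),
    (0.8, 0.10),
    (0.9, 0.20),
]


def select_action(q_table: Dict[Tuple[int, int], float], state: int, epsilon: float,
                  rng: random.Random) -> int:
    """Epsilon-greedy choice over the action indices; unseen pairs default to 0.0."""
    n_actions = len(ACTION_TABLE)
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    return max(range(n_actions), key=lambda a: q_table.get((state, a), 0.0))


def calibrate(action: int) -> Tuple[float, float]:
    """Return the (Cp, Mp) pair associated with the chosen action."""
    return ACTION_TABLE[action]
```

In this reading, the RL agent never edits chromosomes directly; it only decides how aggressively the GA recombines and mutates in the current episode.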


3.3. Design Methodology 


IGA_RLON starts by initializing Ps, 
Tmax and t and generating the representation for the 
chromosomes. The fitness of the population is calculated 
using Eq. 2 and the state is calculated using Eq. 
3. The action to be taken is chosen using Eq. 5 and 
the reward for the current action is calculated using Eq. 
4. The Q-value and Q-table are updated using Eq. 1, and 
the GA control parameters Cp and 
Mp are calibrated by choosing appropriate values from Table 
4.4 using the current action a. Crossover 
and mutation are performed using the updated Cp and Mp 
values. The fitness of each updated offspring 
individual is calculated and, if it is greater than min_util, 
the individual is added to the HUI list. The existing parent 
individual is replaced with the new offspring individual in the 
population. The above process is repeated until the 
termination condition is reached. Figure 5 illustrates 
this process. 


[Figure 5 flowchart, boxes recovered in extraction order: Initialize the Population Size (Ps), Max Iteration (Tmax), Current Iteration (t=0) → Generate bit map representation of random initial chromosome → Calculate the fitness of the population → Calculate the state → Calculate the reward → Add to HUIs list → Output the result] 


Figure 5 : Design Methodology of IGA_RLon 
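
The GA side of this methodology can be sketched for a toy database as follows. Everything concrete here is an assumption: the items, profits, and transactions are invented, the fitness is taken to be the itemset utility (quantity times unit profit), and Cp and Mp are passed in as fixed arguments rather than being supplied by the SARSA agent. The state and reward computations are omitted because their equations are not reproduced in this text.

```python
import random
from typing import Dict, List, Set, Tuple

ITEMS = ["a", "b", "c", "d"]                        # hypothetical item universe
PROFIT = {"a": 4, "b": 1, "c": 3, "d": 2}           # hypothetical unit profits
DATABASE: List[Dict[str, int]] = [                   # hypothetical item -> quantity
    {"a": 1, "b": 2},
    {"a": 2, "c": 1},
    {"b": 1, "c": 2, "d": 1},
    {"a": 1, "c": 1, "d": 3},
]


def fitness(bits: List[int]) -> int:
    """Utility of the itemset encoded by a bitmap chromosome (bit i set = ITEMS[i] present)."""
    items = [ITEMS[i] for i, bit in enumerate(bits) if bit]
    if not items:
        return 0
    return sum(sum(t[i] * PROFIT[i] for i in items)
               for t in DATABASE if all(i in t for i in items))


def roulette_wheel(population: List[List[int]], rng: random.Random) -> List[int]:
    """Select one parent with probability proportional to fitness (+1 keeps zero-utility rows selectable)."""
    weights = [fitness(c) + 1 for c in population]
    return rng.choices(population, weights=weights, k=1)[0]


def next_generation(population: List[List[int]], cp: float, mp: float, min_util: int,
                    hui: Set[Tuple[int, ...]], rng: random.Random) -> List[List[int]]:
    """One iteration: RWS selection, crossover with rate cp, mutation with rate mp, HUI check."""
    new_population: List[List[int]] = []
    while len(new_population) < len(population):
        p1, p2 = roulette_wheel(population, rng), roulette_wheel(population, rng)
        c1, c2 = p1[:], p2[:]
        if rng.random() < cp:
            point = rng.randint(1, len(ITEMS) - 1)
            c1, c2 = p1[:point] + p2[point:], p2[:point] + p1[point:]
        for child in (c1, c2):
            if rng.random() < mp:
                gene = rng.randrange(len(child))
                child[gene] = 1 - child[gene]
            if fitness(child) >= min_util:   # offspring qualifies as a HUI
                hui.add(tuple(child))
            new_population.append(child)
    return new_population[:len(population)]  # offspring replace the parents
```

Calling `next_generation` repeatedly until an iteration limit is reached, with `cp` and `mp` refreshed from the RL action before each call, would give the overall loop described above.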


4. EXPERIMENTAL EVALUATION 


Three performance metrics, namely execution time, 
convergence speed and percentage of discovered HUIs, are used to 
measure the performance of IGA_RLON. The 
proposed IGA_RLON approach is 
compared with HUPEUMU-GRAM, HUIM-BPSO and 
IGA_RLOFF. These algorithms are applied to the 
standard datasets chess, accident_10%, 
mushroom and connect from the SPMF repository. 


4.1. Execution Time 

Figures 6 to 9 show the execution time taken by 
IGA_RLON, HUPEUMU-GRAM, HUIM-BPSO 
and IGA_RLOFF to mine the HUIs from the chess, 
accident_10%, mushroom and connect datasets. 
Each graph plots 
min_util along the x-axis and execution time in 
seconds along the y-axis. 


E-ISSN: 1817-3195 


[Figure 6 plot: execution time w.r.t. minimum utility threshold (min_util), Chess dataset; series: HUPEUMU-GRAM, HUIM-BPSO, IGA_RLOFF, IGA_RLON] 


Figure 6 : Execution Time for Chess dataset 


The main inference from Figure 6 is that the 
proposed IGA_RLON reduces the execution time 
required to mine the HUIs from the chess dataset by 
12.52% and 4.05% when compared with HUPEUMU-GRAM 
and HUIM-BPSO respectively. On the other 
hand, IGA_RLON takes 6.74% more execution time 
than IGA_RLOFF to mine the HUIs 
from the chess dataset. 


[Figure 7 plot: execution time w.r.t. minimum utility threshold (min_util), Accident_10% dataset; x-axis ticks 12.6 to 13.4] 


Figure 7 : Execution Time for Accident_10% dataset 


[Figure 8 plot: execution time w.r.t. minimum utility threshold (min_util), Mushroom dataset] 


Figure 8 : Execution Time for Mushroom dataset 


[Figure 9 plot: execution time w.r.t. minimum utility threshold (min_util), Connect dataset] 


Figure 9 : Execution Time for Connect dataset 


Figure 7 shows that the proposed IGA_RLON reduces 
the execution time required to mine the HUIs from the 
accident_10% dataset by 12.44% and 4.43% when 
compared with HUPEUMU-GRAM and HUIM-BPSO 
respectively. On the other hand, IGA_RLON takes 
2.37% more execution time than 
IGA_RLOFF to mine the HUIs from the accident_10% 
dataset. 

Figure 8 shows that the proposed 
IGA_RLON reduces the execution time required to 
mine the HUIs from the mushroom dataset by 15.54% 
and 6.82% when compared with HUPEUMU-GRAM 
and HUIM-BPSO respectively. On the other hand, 
IGA_RLON takes 4.68% more execution time 
than IGA_RLOFF to mine the HUIs from the 
mushroom dataset. 

Figure 9 shows that the proposed 
IGA_RLON reduces the execution time required to 
mine the HUIs from the connect dataset by 6.97% and 
4.71% when compared with HUPEUMU-GRAM and 
HUIM-BPSO respectively. On the other hand, 
IGA_RLON takes 4.78% more execution time 
than IGA_RLOFF to mine the HUIs from the 
connect dataset. 

4.2. Convergence Speed 

The convergence speed of IGA_RLON is measured 
using the chess, accident_10%, mushroom 
and connect datasets from the SPMF repository and 
compared with HUPEUMU-GRAM, HUIM-BPSO 
and IGA_RLOFF. Each graph is plotted with 
min_util along the x-axis and the number of HUIs discovered 
along the y-axis. 


[Figure 10 plot: convergence performance (No. of HUIs) w.r.t. #Iterations, Chess dataset] 


Figure 10 : Convergence speed for Chess dataset 


[Figure 11 plot: convergence performance (No. of HUIs) w.r.t. #Iterations, Accident_10% dataset; minimum utility threshold (min_util) = 13.0%] 


Figure 11 : Convergence speed for Accident_10% dataset 


[Figure 13 plot: convergence performance (No. of HUIs) w.r.t. #Iterations, Connect dataset] 


Figure 13 : Convergence speed for Connect dataset 


Figures 10 to 13 indicate that IGA_RLON converges at 


4.3. Discovered HUIs 


The percentage of discovered HUIs obtained by IGA_RLON from the 
chess, accident_10%, mushroom and connect datasets 
is analyzed by comparing it with HUPEUMU-GRAM, 
HUIM-BPSO and IGA_RLOFF. Each result is plotted 
as a bar chart with min_util along the x-axis 
and % of discovered HUIs along the y-axis. 


[Plot: #Discovered HUIs w.r.t. minimum utility threshold, Chess dataset] 


Figure 15 : Percent of discovered HUIs from 
Accident_10% dataset 


[Figure 16 plot: #Discovered HUIs w.r.t. minimum utility threshold, Mushroom dataset] 


Figure 16 : Percent of discovered HUIs from Mushroom 
dataset 

The inference from Figure 14 is that the percentage of 
discovered HUIs from the chess dataset by IGA_RLON 
is improved by 29.85% and 15.35% when compared 
with HUPEUMU-GRAM and HUIM-BPSO respectively. However, the percentage 
of discovered HUIs by IGA_RLON is lower by 29.10% 
when compared with IGA_RLOFF. 

Figure 15 illustrates that the percentage of 
discovered HUIs from the accident_10% dataset by 
IGA_RLON is improved by 80.14% and 8.25% when 
compared with HUPEUMU-GRAM and HUIM-BPSO respectively. 
However, the percentage of discovered HUIs by IGA_RLON is 
lower by 3.21% when compared with IGA_RLOFF. 

The observation from Figure 16 is that the 
percentage of discovered HUIs from the mushroom dataset 
by IGA_RLON is improved by 18.14% and 3.08% 
when compared with HUPEUMU-GRAM and HUIM-BPSO 
respectively. However, the percentage of discovered HUIs by 
IGA_RLON is lower by 19.12% when compared with 
IGA_RLOFF. 


[Figure 17 plot: % Discovered HUIs w.r.t. minimum utility threshold, Connect dataset] 


Figure 17 : Percent of discovered HUIs from Connect 
dataset 


The observation from Figure 17 is that the percentage of 
discovered HUIs from the connect dataset by 
IGA_RLON is improved by 79.87% and 9.42% when 
compared with HUPEUMU-GRAM and HUIM-BPSO respectively. 
However, the percentage of discovered HUIs by IGA_RLON is 
lower by 5.15% when compared with IGA_RLOFF. 


5. CONCLUSION 


This research explored the fundamental architecture 
of SARSA, the design methodology of the proposed 
IGA_RLON approach, and its performance. 
The experimental analysis shows 
that IGA_RLON performs better in terms 
of execution time, convergence speed and percentage of 
discovered HUIs when compared with HUPEUMU-GRAM 
and HUIM-BPSO. 

The IGA_RLON algorithm reduces 
execution time by 
12.52% and 4.05% compared to HUPEUMU-GRAM 
and HUIM-BPSO respectively on the chess 
dataset. However, it takes 
6.74% more execution time than IGA_RLOFF 
to mine the HUIs from the same dataset. 


IGA_RLON demonstrates competitive 
convergence speed and generates a significant 
number of high utility itemsets compared to 
HUPEUMU-GRAM and HUIM-BPSO. 
Nevertheless, it exhibits a slightly slower 
convergence rate in comparison to IGA_RLOFF. 


IGA_RLON significantly improves the 
percentage of discovered High Utility Itemsets 
(HUIs) compared to HUPEUMU-GRAM and HUIM-BPSO 
across datasets, showing enhancements of up 
to 80.14%. However, it exhibits a lower percentage 
of discovered HUIs compared to IGA_RLOFF, with 
reductions of up to 29.10%. 

The proposed methodology leverages the 
synergy between GA and SARSA to enhance the 
efficiency and effectiveness of HUI mining. By 
integrating reinforcement learning principles into the 
genetic algorithm framework, IGA_RLON 
demonstrates performance 
improvements over HUPEUMU-GRAM and HUIM-BPSO, paving the way for future 
advancements in this domain. 